ABSTRACT

The human visual system has the ability to utilize motion information to infer the shapes of surfaces. More specifically, we are able to derive descriptions of rigidly rotating smooth surfaces entirely from the orthographic projection of the motions of their surface markings.

A computational analysis of this ability is proposed based on a "shape from motion" proposition. This proposition states that given the first spatial derivatives of the orthographically projected velocity and acceleration fields of a rigidly rotating regular surface, the angular velocity and the surface normal at each visible point on that surface are uniquely determined up to a reflection.

The computational analysis proceeds in three main steps. First it is shown that surface tilt and one component of the angular velocity may be obtained entirely from the first spatial derivatives of the velocity field. Second it is shown that surface slant and the remaining two components of the angular velocity are computable if the first spatial derivatives of the acceleration field are also given. Finally the problem of constructing a velocity field from the temporally changing optic array is briefly discussed.
1. Introduction

Visual motion provides a powerful base for inferences about the layout of the immediate environment and the motions of the various constituents of that environment. The focus of this paper is one inference that the human visual system does appear to perform routinely based on visual motion alone. In particular, the human visual system has a remarkable ability to utilize motion information to infer the three dimensional shapes of surfaces. More specifically, we are able to derive correct descriptions of rigidly rotating smooth surfaces entirely from the orthographic projection of the motions of their surface markings.

A demonstration of this ability, similar to that of Ullman (1979), is illustrated in figure 1. Dots are randomly placed on a sphere in the memory of a computer. Successive snapshots of this random dot sphere are generated at five degree intervals and orthographically projected in quick temporal succession (using an ISI of 20 msec and a presentation time per frame of 20 msec) on a computer driven CRT. Figure 1a shows three successive frames as they would appear statically on the CRT. As is obvious from the figure each individual frame gives no impression of being a sphere.1 Rather it just looks like a somewhat circular array of random dots. However, when the frames are presented in quick temporal succession one obtains a compelling perception of a smooth sphere in rotation (see figure 1b).
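
The stimulus described above can be sketched in a few lines. The following is only an illustration under stated assumptions: the rotation axis (vertical, y), the number of dots, the random seed, and the function names are all hypothetical, and the sphere is treated as transparent so that every dot is projected.

```python
import math
import random

def random_sphere_points(n, seed=0):
    """Sample n dots uniformly on the unit sphere (z uniform, azimuth uniform)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def rotate_about_y(point, angle_deg):
    """Rotate a 3-D point about the vertical (y) axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))

def orthographic_frames(points, n_frames=3, step_deg=5.0):
    """Project successive rotated snapshots onto the image plane by dropping z."""
    frames = []
    for k in range(n_frames):
        frames.append([rotate_about_y(p, k * step_deg)[:2] for p in points])
    return frames

# Three successive frames at five degree intervals (dot count is an assumption).
frames = orthographic_frames(random_sphere_points(200))
```

Each frame is just a list of (x, y) dot positions inside a disc; any impression of a sphere has to come from how the positions change between frames.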

It is important to note that the perception is of a smooth spherical surface, not, for example, of invisible wires connecting the individual dots as in Johansson's biological motion (Johansson, 1973). One has the feeling that there is an almost tangible smooth black pearl with little lights attached to its surface. The importance of noting this is that it indicates the type of description that appears to be built by the visual system. It is a description whose primitives relate to surfaces rather than to positions of isolated points.2

That this visual ability is a nontrivial feat becomes apparent when it is realized that the mapping from the environment onto the retina is many to one. The information available to the visual system underdetermines the surface which is the source of the motion observed, so that any conclusions drawn about that surface are in principle nondemonstrative. Yet, surprisingly, our perception is, in general, of a unique surface in rotation. More surprisingly, it is more often than not correct. Clearly the visual system must be utilizing generally valid constraints about the nature of surfaces and objects in our world in order to obtain this unique solution. One constraint of central importance in obtaining a unique surface is the rigidity constraint; the environment is usually, though not always, composed of rigid objects (Ullman, 1979; Johansson, 1964 & 1975; Hay, 1966; Green, 1961; Wallach & O'Connell, 1953; Gibson & Gibson, 1957). Later this constraint will be given a precise mathematical formulation and its utility in arriving at a unique interpretation clearly illustrated.

The goal of this paper is to provide a description of this perceptual ability at a level which Marr and Poggio have called a computational theory (Marr & Poggio, 1977). The computational analysis proposed is based on a "shape from motion" proposition3 which states that given the first spatial
1 This eliminates single frame information such as texture gradients from being a plausible explanation of this ability.

2 This does not discount the possibility, of course, that positions of points might be computed first and smooth surfaces fitted through them afterwards. In fact, just such a scheme appears to be utilized in stereo vision (Grimson, 1980).

3 The term "proposition" is not intended to imply any hubristic claims regarding the complexity of this result or its derivation. Rather it is intended to emphasize that the present inquiry is a computational analysis.


Figure 2. Surface representation using slant, $\sigma$, and tilt, $\tau$. Rather than representing the surface normal at a point in terms of surface gradients ($z_x$ and $z_y$) it is convenient to adopt the slant and tilt convention proposed by both Stevens (1980) and Attneave (1972). Briefly, tilt indicates in which direction a surface is rotated from the observer's frontal plane and slant indicates how much it is rotated away from the frontal plane in that direction. Whereas surface gradients tend to infinity at occluding contours, tilt ranges only between 0 and 180 degrees and slant ranges from 0 to 90 degrees. The equations of transformation are $\sigma = \tan^{-1}\sqrt{z_x^2 + z_y^2}$ and $\tau = \tan^{-1}(z_y / z_x)$.

derivatives of the orthographically projected velocity and acceleration fields of a rigidly rotating regular surface, its angular velocity and the surface normal at each visible point on the surface are uniquely determined up to a reflection about the image plane.
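
The slant and tilt convention of figure 2 can be made concrete with a small conversion routine. This is a minimal sketch, not part of the original analysis: the function name is hypothetical, angles are returned in radians, and tilt is folded into the half-open range [0, pi) to match the 0 to 180 degree range quoted in the caption.

```python
import math

def slant_tilt(zx, zy):
    """Convert surface gradients (z_x, z_y) to (slant, tilt) in radians."""
    # Slant: how far the surface is rotated away from the frontal plane.
    slant = math.atan(math.hypot(zx, zy))      # 0 <= slant < pi/2
    # Tilt: direction of that rotation, folded so that gradients (zx, zy)
    # and (-zx, -zy) share one tilt. Tilt is undefined when zx = zy = 0;
    # atan2 then returns 0.
    tilt = math.atan2(zy, zx) % math.pi        # 0 <= tilt < pi
    return slant, tilt
```

Because gradients of opposite sign map to the same tilt, the tilt alone cannot distinguish a surface from its reflection about the image plane.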

For clarity the computational analysis is presented in three main steps. First it is shown that surface tilt (see figure 2) and one component of the angular velocity may be obtained from the first spatial derivatives of the velocity field. Then it is shown that surface slant and the remaining two components of the angular velocity are computable if the first spatial derivatives of the acceleration field are also given. Finally, since the computational analysis assumes as one of its givens a velocity field, the problem of constructing a velocity field from the temporally changing optic array is discussed briefly.
2. Two Previous Computational Analyses

The ability of the human visual system to infer the correct three dimensional description of an object from its projected motion alone has been investigated computationally several times before. Two of these previous analyses will be briefly discussed to illustrate the two basic types of computational approaches that can be taken to this problem and the two basic types of resulting descriptions.

Ullman (1979) took what may be called the "discrete approach" to the problem. The givens for his computational analysis are three successive snapshots of isolated points moving in a rigid configuration. The resulting description he builds is essentially a set of triples giving the three dimensional positions of the points in relation to each other. Fundamental to Ullman's elegant analysis is his "structure from motion" theorem which states that the structure of four non-coplanar points in a rigid configuration is recoverable from three orthographic projections.

An example of the "continuous approach" to the problem can be found in Longuet-Higgins and Prazdny (1980).4 Rather than utilizing discrete orthographic projections as input, they assume a velocity field arising from a perspective projection. The resulting description computed involves surfaces instead of sets of triples. In short they prove that given the perspective projection and first and second spatial derivatives of the velocity field presented to a moving observer it is in principle possible to compute both the observer's motion and the surface gradients at each point in the visual field.

The present analysis falls into the continuous category. Flow fields are assumed as the input and a description of the surface of interest in terms of the surface normal (slant, tilt) at each visible point is the desired result. Where the current analysis differs from that of Longuet-Higgins and Prazdny and other previous work within the continuous approach is that here orthographic projection is assumed instead of perspective projection. Consequently in this analysis it proves impossible to derive both the observer's motion and a complete surface description merely from the velocity field and its spatial derivatives. The relations of these various approaches are summarized in figure 3.


3.  Why  Use  Orthographic  Projection? 

Why bother performing a computational analysis of the problem assuming orthographic projection? After all it will be shown that less information about local surface properties can be computed from the velocity field in orthographic projection than in perspective. Specifically, surface slant computation requires the temporal derivative of the velocity field. There are several motivations.

First, as Ullman (1979) points out, perspective effects are often rather noisy and unreliable. To utilize them locally would require very careful measurements by the visual system.

Second, orthographic projection provides a good local approximation to the actual retinal projection. A theorem from differential topology allows us to conclude that whatever the true retinal


4 Several other researchers have examined aspects of this problem from a continuous point of view (Koenderink & Van Doorn, 1976; Nakayama & Loomis, 1974; Gibson, 1950).

Figure 3. A categorization of the various computational approaches to the problem of deriving shape from motion. The categorization scheme is given by crossing projection type (orthographic or perspective) with motion type (discrete frames versus optical flow).

projection  is,  it  is  locally  equivalent  to  orthographic  projection.5 

A third motivation is provided by the results of some psychophysical tests done by Ullman. Using a cylinder composed of random dots he showed that observers can recover the correct structure entirely from the orthographic projection of the motion of the dots when the cylinder is rotated about its axis. However observers cannot recover the structure under perspective projection when the object is alternately receding and approaching without any rotation. This is significant because a computational analysis shows that if perspective effects are taken into account the structure can in principle be recovered from receding and approaching motion alone. These results tend to support the psychological reality of a computational theory based on a locally orthographic projection for the recovery of shape from motion.

Alternative computational analyses provide clear candidate hypotheses that may be tested for their psychological reality, and each lends different insights into the subject of study. For example it will be shown later that the tilt component of the surface normal is much more easily recovered than the slant component, both in the nature of the motion information required and the computations involved. This is an interesting result and one that could provide a basis for psychophysical examination of the psychological reality of this analysis.

5 The theorem is called the Local Submersion Theorem (see, for example, Guillemin & Pollack (1974)). It states, "Suppose that $f: X \to Y$ is a submersion at $x$, and $y = f(x)$. Then there exist local coordinates around $x$ and $y$ such that, for $k \ge l$, $f(x_1, \ldots, x_k) = (x_1, \ldots, x_l)$. That is, $f$ is locally equivalent to the canonical submersion near $x$."

Finally, the equations for surface orientation and motion derived using orthographic projection are much simpler than those derived under perspective projection. Not only are the equations simpler, they do not require measurements of the second spatial derivatives of the velocity field as is typical in the perspective case.


4. Geometrical Model

The idealized geometry underlying the following computational analysis is illustrated in figure 4. A rigid patch of surface, $S$, is considered to be an open set of points each of which has an associated position vector $\mathbf{R}$. The position vector for a point on $S$ with respect to the $x$, $y$, $z$ coordinate system is given by:

$$\mathbf{R} = x\mathbf{i} + y\mathbf{j} + z(x, y)\mathbf{k} \tag{1}$$

where $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ are unit vectors along the $x$, $y$, $z$ axes respectively. The surface, $S$, has an angular velocity $\boldsymbol{\Omega}$ given by:

$$\boldsymbol{\Omega} = \omega_1\mathbf{i} + \omega_2\mathbf{j} + \omega_3\mathbf{k} \tag{2}$$

with respect to the $x$, $y$, $z$ coordinate system.

Note that $\boldsymbol{\Omega}$ may result either from rotary motions of the surface or from movement of the image plane, $I$, with respect to $S$, or both. As long as $\boldsymbol{\Omega}$ is not zero it doesn't matter whether the surface is rotating and the observer remains stationary or whether the surface is stationary and the observer's motion with respect to the surface includes an angular component.

Associated with $S$ is a velocity vector field, $\mathbf{V}$, which at any point $p \in S$ is given by:

$$\mathbf{V} = \boldsymbol{\Omega} \times \mathbf{R} + \mathbf{T} \tag{3}$$

where $\mathbf{T}$ is any net translation between the observer and the surface.6

The velocity field available to the observer is an orthographic (parallel) projection of the velocity field, $\mathbf{V}$, associated with $S$ onto the image plane, $I$.
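
The geometrical model can be exercised numerically. The sketch below evaluates equation (3) at a single surface point and then applies the parallel projection by discarding the component along the line of sight; the function names and the tuple representation of vectors are assumptions made for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def velocity(omega, r, t=(0.0, 0.0, 0.0)):
    """Equation (3): V = Omega x R + T at the surface point with position R."""
    c = cross(omega, r)
    return (c[0] + t[0], c[1] + t[1], c[2] + t[2])

def orthographic(v):
    """Parallel projection onto the image plane: keep x and y, drop z."""
    return (v[0], v[1])

# A point at (1, 0, 5) on a surface spinning about the line of sight.
image_velocity = orthographic(velocity((0.0, 0.0, 1.0), (1.0, 0.0, 5.0)))
```

Note that the depth of the point (here 5) has no effect on the projected velocity for rotation about the line of sight, and that any translation along z vanishes in projection.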

Now this is clearly an idealization. The real observer is definitely not given a velocity field but must construct such a field from the temporally changing optic array. This problem will be discussed briefly later. For the analysis of the present problem of inferring the shape of $S$, the orthographically projected velocity field is assumed as a given.

With this simple geometrical model as background the computational analysis of the problem of inferring shape from orthographically projected motion is now presented as the proof of the following proposition.


6 Actually $\mathbf{T}$ is any net translation between the observer and the axis of rotation of the surface. However, the translation term is of no consequence for the present analysis since $\mathbf{T}$ will drop out when the spatial derivatives of (3) are taken.
5. The Shape from Motion Proposition

Given the first spatial derivatives of the orthographically projected velocity and acceleration fields of a rigidly rotating regular surface, the angular velocity and surface normal at each visible point on the surface are uniquely determined up to a reflection about the image plane.

Proof. The proof of this proposition involves deriving equations for the two components ($\sigma$, $\tau$) of the surface normal, $\mathbf{N}$, at each visible point and for the three components of the angular velocity ($\omega_1$, $\omega_2$, $\omega_3$). For clarity of presentation the proof is divided into two lemmas. In the first lemma equations for the tilt, $\tau$, and one component of the angular velocity, $\omega_3$, are derived and discussed. In the second lemma the same is done for the slant, $\sigma$, and the remaining two components of the angular velocity, $\omega_1$ and $\omega_2$.


5.1 Lemma 1.

Both the tilt, $\tau$, at each visible point on $S$ and the component of angular velocity about the axis orthogonal to the image plane, $\omega_3$, are computable given only the first spatial derivatives of the orthographically projected velocity field.

To make the claims of this lemma clearer figure 5 illustrates the tilt fields associated with two simple surfaces and figure 4 illustrates with which axis the angular velocity component, $\omega_3$, is associated.

Proof of Lemma 1. Since the projection plane, $I$, is orthogonal to the unit vector, $\mathbf{k}$, the orthographic projection of the velocity field, $\mathbf{V}^*$, is given by:7

$$\mathbf{V}^* = \mathbf{V} - (\mathbf{V} \cdot \mathbf{k})\mathbf{k} \tag{4}$$

What this essentially means is that the components of the velocity field along the $x$ and $y$ axes survive orthographic projection unaltered, whereas the component along the $z$ axis (i.e., along the observer's line of sight) is eliminated completely. Consequently the only spatial derivatives of the velocity field that need be computed are along the $x$ and $y$ directions. Denoting spatial partial derivatives by subscripts, the first spatial derivatives of the velocity field (equation 3) along the $x$ and $y$ axes are:

$$\mathbf{V}_x = \boldsymbol{\Omega}_x \times \mathbf{R} + \boldsymbol{\Omega} \times \mathbf{R}_x \tag{5}$$

$$\mathbf{V}_y = \boldsymbol{\Omega}_y \times \mathbf{R} + \boldsymbol{\Omega} \times \mathbf{R}_y \tag{6}$$

Before investigating equations (5) and (6) further it is helpful to introduce a mathematical expression for the rigidity constraint that will allow these equations to be simplified. The motivation for the particular mathematical expression to be used here is simple. One consequence of surface rigidity is that the entire surface can have only one angular velocity, $\boldsymbol{\Omega}$. Regardless of which neighborhood of

7 This characterization of the orthographic projection of a vector is borrowed from Witkin (1980).

Figure 5. Tilt fields compared with fields of surface normals for two surfaces. According to lemma 1 one can obtain the correct tilt field ($1$, $\tau$) from the velocity field but, unfortunately, not the field of surface normals ($\sigma$, $\tau$). A tilt field is an example of a field of directions (Do Carmo, 1976, p. 178). Since no magnitude information is known, only the direction of tilt, the tilt fields in (a) and (b) are indicated by constant length vectors pointing in the direction of tilt. The surfaces are (a) a sphere and (b) a cylinder.
the surface is examined only one value for $\boldsymbol{\Omega}$ will be returned. This is equivalent to stating that the spatial derivatives of the angular velocity along the surface be zero:

$$\boldsymbol{\Omega}_x = \boldsymbol{\Omega}_y = \boldsymbol{\Omega}_z = 0 \tag{7}$$

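This constraint allows us to eliminate the first terms on the right hand sides of equations (5) and (6). The simplified expressions are:

$$\mathbf{V}_x = \boldsymbol{\Omega} \times \mathbf{R}_x \tag{8}$$

$$\mathbf{V}_y = \boldsymbol{\Omega} \times \mathbf{R}_y \tag{9}$$

where

$$\mathbf{R}_x = \mathbf{i} + z_x\mathbf{k} \tag{10}$$

$$\mathbf{R}_y = \mathbf{j} + z_y\mathbf{k} \tag{11}$$

are the surface gradient vectors whose cross product is the surface normal at a given point $p \in S$.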

To simplify notation, let $u = \mathbf{V} \cdot \mathbf{i}$ and $v = \mathbf{V} \cdot \mathbf{j}$. As before let spatial partial derivatives be denoted by subscripts. Under this notation the velocity field available to the observer is a set of pairs, $(u, v)$, where $u$ is the $x$ component and $v$ the $y$ component of the velocity vector at a given point in the field. The first spatial derivatives of the velocity field, which we wish to prove are sufficient to compute $\tau$ and $\omega_3$, become simply $u_x$, $u_y$, $v_x$ and $v_y$. These are illustrated in figure 6.

By expanding the indicated cross products in equations (8) and (9) and writing out the $x$ and $y$ components of the results as separate equations, we arrive at the four basic equations from which $\tau$ and $\omega_3$ will be derived:


$$u_x = \omega_2 z_x \tag{12}$$

$$v_x = \omega_3 - \omega_1 z_x \tag{13}$$

$$u_y = -\omega_3 + \omega_2 z_y \tag{14}$$

$$v_y = -\omega_1 z_y \tag{15}$$

Equations (12)-(15) relate four quantities which may in principle be measured from the image, ($u_x$, $u_y$, $v_x$, $v_y$), to the two components of the local surface normal, ($\sigma$, $\tau$),8 and the three components of the angular velocity, ($\omega_1$, $\omega_2$, $\omega_3$). Since we have four equations and five unknowns, the surface normal and angular velocity are underdetermined. However, we can solve for $\omega_3$, $\tau$, and $\omega_1/\omega_2$:

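$$\omega_3 = \frac{v_x - u_y \pm \sqrt{(u_y + v_x)^2 - 4 u_x v_y}}{2} \tag{16}$$

$$\tau = \tan^{-1}\left(\frac{u_y + \omega_3}{u_x}\right) \tag{17}$$

$$\frac{\omega_1}{\omega_2} = \frac{\omega_3 - v_x}{u_x} \tag{18}$$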


which  concludes  the  proof  of  lemma  1. 
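
Lemma 1 can be checked on synthetic data. The sketch below is illustrative only: it generates the four image derivatives from equations (12)-(15) for assumed values of the surface gradients and angular velocity, and then solves the quadratic behind equation (16), taking the tilt from the ratio of equations (14) and (12). The function names and test values are hypothetical, and the sketch assumes $u_x \neq 0$. Both roots of the quadratic are returned; deciding which root is the physical one is outside this sketch.

```python
import math

def first_derivatives(zx, zy, w1, w2, w3):
    """Forward model, equations (12)-(15)."""
    ux = w2 * zx
    vx = w3 - w1 * zx
    uy = -w3 + w2 * zy
    vy = -w1 * zy
    return ux, uy, vx, vy

def lemma1(ux, uy, vx, vy):
    """Return candidate (omega_3, tilt, omega_1/omega_2) triples, one per root."""
    disc = (uy + vx) ** 2 - 4.0 * ux * vy
    if disc < 0.0:
        return []                      # no real omega_3 exists for these inputs
    root = math.sqrt(disc)
    candidates = []
    for sign in (1.0, -1.0):
        w3 = 0.5 * (vx - uy + sign * root)
        # tan(tilt) = z_y / z_x = (u_y + omega_3) / u_x, from (14) and (12).
        tilt = math.atan2(uy + w3, ux) % math.pi
        ratio = (w3 - vx) / ux         # omega_1 / omega_2, from (12) and (13)
        candidates.append((w3, tilt, ratio))
    return candidates

# Hypothetical ground truth: z_x = 0.5, z_y = 0.3, Omega = (0.4, 0.7, 0.2).
candidates = lemma1(*first_derivatives(0.5, 0.3, 0.4, 0.7, 0.2))
```

For these values the first candidate reproduces the assumed $\omega_3$, tilt and ratio $\omega_1/\omega_2$, while neither the slant nor the individual magnitudes of $\omega_1$ and $\omega_2$ can be read off, in keeping with the lemma.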

8 Actually the surface normal is expressed in terms of $z_x$ and $z_y$ in equations (12)-(15). The transformations to slant, $\sigma$, and tilt, $\tau$, are given in figure 2.
[Figure 6, panel (b): finite-difference estimates of $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$ and $\partial v/\partial y$, each formed as the sum of two velocity differences between neighbouring sample points divided by $2d$.]

Figure 6. The orthographically projected velocity field and its first spatial derivatives. (a) illustrates the decomposition of a velocity vector at a point in the field into its $x$ component, $u$, and its $y$ component, $v$. (b) illustrates how the spatial derivatives $u_x$, $u_y$, $v_x$ and $v_y$ can be approximated at some point, $p$, from the local velocity field.
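
One way to approximate the four spatial derivatives from a sampled velocity field is by central differences. The sketch below is an assumption-laden illustration, not the stencil of figure 6: the velocity components are supplied as hypothetical callables `u(x, y)` and `v(x, y)`, and `d` is the sample spacing.

```python
def velocity_gradients(u, v, x, y, d=1e-3):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) at the point (x, y)."""
    ux = (u(x + d, y) - u(x - d, y)) / (2.0 * d)
    uy = (u(x, y + d) - u(x, y - d)) / (2.0 * d)
    vx = (v(x + d, y) - v(x - d, y)) / (2.0 * d)
    vy = (v(x, y + d) - v(x, y - d)) / (2.0 * d)
    return ux, uy, vx, vy

# Example field: rotation about the line of sight with omega_3 = 0.5,
# for which the projected velocity is (u, v) = (-omega_3 * y, omega_3 * x).
omega3 = 0.5
grads = velocity_gradients(lambda x, y: -omega3 * y,
                           lambda x, y: omega3 * x,
                           0.2, -0.1)
```

For this field the estimates come out as $u_y = -\omega_3$ and $v_x = \omega_3$ with the other two derivatives zero, which agrees with equations (12)-(15) when the surface gradients do not contribute.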
5.2 Remarks on Lemma 1.

An important problem for the computational investigation of early vision is the initial carving up of the visual array into tentative objects and a background. This is important because it is a fundamental contention of the bottom up computational approach that there exist autonomous low level visual processes capable of providing a useful initial segregation of the visual world independent of higher level cognitive influences. For example, it is a primary goal of the primal sketch and 2½-D sketch, early visual representations proposed by Marr (1976) and Marr and Nishihara (1978), to make explicit exactly that information in a visual image which is required to build useful descriptions of the image in terms of objects and their relations. The processes proposed both to build and to operate upon these representations are invariably bottom up. If this endeavor fails, so too does much of the computational approach to vision. Therefore a high priority activity of computational research in vision is to provide convincing existence proofs (e.g., running computer programs) for this contention.

Visual motion seems a likely candidate base for tentative structuring of the visual array via autonomous processes. This has been suggested many times before. Ullman (1979, p. 76) proposes that a primitive motion correspondence process might be causally related to the child's acquisition of object constancy over changing views of an object. Marr and Ullman (1979) have suggested that retinal velocity fields may be used to segregate the visual world by exploiting the "principle of continuous flow". This principle states that "the velocity field of motion within the image of a rigid object varies continuously almost everywhere."

The results of lemma 1 suggest four further motion based segregation methods. The first two arise again from the fact that a rigid body can have but one angular velocity at any instant. Since lemma 1 provides methods to compute $\omega_3$ and $\omega_1/\omega_2$ locally, it is possible to segregate the field into regions of constant $\omega_3$ and constant $\omega_1/\omega_2$. In fact, the segregations obtained by the two methods should agree, providing the necessary redundancy to check for gross errors.

The third method is based on noting that the discriminant of equation (16) for $\omega_3$ remains nonnegative, so that $\omega_3$ remains real, over regions in the image which are the projections of smooth rigid surfaces.9 Therefore points where $\omega_3$ becomes complex indicate regions in the image where the rigidity constraint is violated (or where the surface has a discontinuity from the current viewpoint).
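
The third segregation method lends itself to a direct test on measured derivatives. The following sketch flags samples whose discriminant in equation (16) is negative; the function names, the tuple layout ($u_x$, $u_y$, $v_x$, $v_y$) and the tolerance parameter are assumptions for illustration.

```python
def rigidity_discriminant(ux, uy, vx, vy):
    """Discriminant of the quadratic for omega_3 in equation (16)."""
    return (uy + vx) ** 2 - 4.0 * ux * vy

def label_nonrigid(samples, tol=0.0):
    """Indices of samples for which omega_3 would be complex."""
    return [i for i, s in enumerate(samples)
            if rigidity_discriminant(*s) < -tol]

samples = [
    (0.35, 0.01, 0.0, -0.12),   # generated from a rigid rotating patch
    (1.0, 0.0, 0.0, 1.0),       # uniform expansion in the image
    (0.0, -0.5, 0.5, 0.0),      # pure rotation about the line of sight
]
flagged = label_nonrigid(samples)
```

Only the uniform expansion is flagged: under orthographic projection a rigid rotation cannot produce it, whereas the other two samples are consistent with rigid motion. A nonzero `tol` would be needed with noisy measurements.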

Finally, we can utilize constraints on tilt fields. For smooth rigid surfaces a "principle of continuous tilt" analogous to that proposed by Marr and Ullman for optical flow may be invoked to segregate the visual array. This principle states that "the tilt field within the image of a rigid object varies continuously almost everywhere."

These four segregation techniques are not isomorphic to Marr and Ullman's principle of continuous flow. The methods suggested here segregate the image into regions which are the projections of rigid objects. The principle of continuous flow cannot. Since it does not explicitly incorporate a rigidity constraint,10 the principle of continuous flow cannot be used to distinguish regions of smooth

9 This is easily proved by substituting from equations (12)-(15) into the appropriate terms of the discriminant of (16). Simplifying gives $(\omega_1 z_x + \omega_2 z_y)^2$, which is never negative. Implicit in equations (12)-(15) is the rigidity assumption.

10 The word rigid does appear in the statement of their principle, but it is equally true that "the velocity field of motion within the image of a bending object varies continuously almost everywhere."
flow in the image which arise from rigid objects from those which arise from bending or otherwise non-rigid substances. Consequently the segregations provided by the different methods are not identical but are useful for different purposes.


5.3  Lemma  2. 

The surface slant, $\sigma$, and the remaining two components of the angular velocity, $\omega_1$ and $\omega_2$, are computable given the spatial derivatives of the orthographically projected acceleration field in addition to those of the velocity field.

Proof of Lemma 2. The acceleration field associated with a smooth rigid surface is found by taking the time derivative of equation (3). Indicating temporal derivatives by primes we have:

$$\mathbf{V}' = \boldsymbol{\Omega}' \times \mathbf{R} + \boldsymbol{\Omega} \times \mathbf{R}' + \mathbf{T}' \tag{19}$$

where

$$\mathbf{R}' = \boldsymbol{\Omega} \times \mathbf{R} = \mathbf{V} \tag{20}$$

$$\boldsymbol{\Omega}' = \omega_1'\mathbf{i} + \omega_2'\mathbf{j} + \omega_3'\mathbf{k} \tag{21}$$

If  we  take  the  first  spatial  derivatives  of  (19),  simplify  the  results  using  the  rigidity  constraint  of  (7), 
and  expand  the  indicated  cross  products  as  before,  we  obtain  the  four  equations: 


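$$u'_x = \omega_2' z_x - \omega_2^2 - \omega_3^2 + \omega_3\omega_1 z_x \tag{22}$$

$$u'_y = \omega_2' z_y - \omega_3' + \omega_1\omega_2 + \omega_3\omega_1 z_y \tag{23}$$

$$v'_x = \omega_3' - \omega_1' z_x + \omega_2\omega_3 z_x + \omega_1\omega_2 \tag{24}$$

$$v'_y = -\omega_1' z_y - \omega_1^2 - \omega_3^2 + \omega_2\omega_3 z_y \tag{25}$$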

Equations (12)-(15) and (22)-(25) relate eight quantities measurable in principle from the image, ($u_x$, $u_y$, $v_x$, $v_y$, $u'_x$, $u'_y$, $v'_x$, $v'_y$), to the eight unknowns of interest: the local surface normal, ($\sigma$, $\tau$), the three components of the angular velocity, ($\omega_1$, $\omega_2$, $\omega_3$), and the three components of the angular acceleration, ($\omega_1'$, $\omega_2'$, $\omega_3'$). The simple fact that we have eight equations in eight unknowns does not necessarily imply that this system has but a finite number of solutions. To ascertain if there are a finite number of solutions we apply the inverse function theorem.11 This theorem allows us to conclude

11 For an informal discussion of the utility of the inverse function theorem, Bezout's theorem, and Sard's theorem for problems involving systems of nonlinear equations see Richards, Rubin, and Hoffman (1981).
that wherever the Jacobian of these equations is nonsingular the mapping defined by the equations is locally one to one and onto (i.e., a local diffeomorphism). Consequently any roots at points where the Jacobian is nonsingular are isolated and not part of a continuum of solutions. The determinant of the Jacobian of (12)-(15) and (22)-(25) is:


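$$
\begin{vmatrix}
\omega_2 & 0 & 0 & z_x & 0 & 0 & 0 & 0 \\
-\omega_1 & 0 & -z_x & 0 & 1 & 0 & 0 & 0 \\
0 & \omega_2 & 0 & z_y & -1 & 0 & 0 & 0 \\
0 & -\omega_1 & -z_y & 0 & 0 & 0 & 0 & 0 \\
\omega_1\omega_3 + \omega'_2 & 0 & \omega_3 z_x & -2\omega_2 & \omega_1 z_x - 2\omega_3 & 0 & z_x & 0 \\
0 & \omega_1\omega_3 + \omega'_2 & \omega_3 z_y + \omega_2 & \omega_1 & \omega_1 z_y & 0 & z_y & -1 \\
\omega_2\omega_3 - \omega'_1 & 0 & \omega_2 & \omega_3 z_x + \omega_1 & \omega_2 z_x & -z_x & 0 & 1 \\
0 & \omega_2\omega_3 - \omega'_1 & -2\omega_1 & \omega_3 z_y & \omega_2 z_y - 2\omega_3 & -z_y & 0 & 0
\end{vmatrix}
$$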

This Jacobian has rank eight. Consequently the system of equations has but a finite set of solutions
in general.¹² By Bezout's theorem¹¹ we know that the sum of the multiplicities of the solutions does
not exceed the product of the degrees of the equations.
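The rank claim can be illustrated numerically. The sketch below assumes the unknowns are ordered $(z_x, z_y, \omega_1, \omega_2, \omega_3, \omega'_1, \omega'_2, \omega'_3)$, assumes the velocity relations take the form $u_x = \omega_2 z_x$, $u_y = \omega_2 z_y - \omega_3$, $v_x = \omega_3 - \omega_1 z_x$, $v_y = -\omega_1 z_y$, and evaluates the determinant by Gaussian elimination at one arbitrarily chosen point. The helper names and sample values are hypothetical.

```python
# Evaluate the 8x8 Jacobian determinant at sample points (illustrative sketch).

def jacobian(zx, zy, w, dw):
    """Rows: u_x, v_x, u_y, v_y, u'_x, u'_y, v'_x, v'_y.
    Columns: z_x, z_y, w1, w2, w3, w1', w2', w3' (an assumed ordering)."""
    w1, w2, w3 = w
    d1, d2, d3 = dw
    e = w1 * w3 + d2   # shared entry of the two u' rows
    f = w2 * w3 - d1   # shared entry of the two v' rows
    return [
        [w2, 0, 0, zx, 0, 0, 0, 0],
        [-w1, 0, -zx, 0, 1, 0, 0, 0],
        [0, w2, 0, zy, -1, 0, 0, 0],
        [0, -w1, -zy, 0, 0, 0, 0, 0],
        [e, 0, w3 * zx, -2 * w2, w1 * zx - 2 * w3, 0, zx, 0],
        [0, e, w3 * zy + w2, w1, w1 * zy, 0, zy, -1],
        [f, 0, w2, w3 * zx + w1, w2 * zx, -zx, 0, 1],
        [0, f, -2 * w1, w3 * zy, w2 * zy - 2 * w3, -zy, 0, 0],
    ]

def determinant(rows):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in rows]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular to working precision
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

# A generic point: the determinant is nonzero, so roots there are isolated.
det_generic = determinant(jacobian(1.0, 2.0, (1.0, 2.0, 3.0), (1.0, 1.0, 1.0)))

# A frontal-parallel patch (z_x = z_y = 0) zeroes two columns: a degenerate case.
det_frontal = determinant(jacobian(0.0, 0.0, (1.0, 2.0, 3.0), (1.0, 1.0, 1.0)))
```

A nonzero value at the generic point is consistent with the rank claim; the frontal-parallel case is one simple example of a degenerate condition under these assumptions.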


We have shown that there are but a finite number of solutions given the spatial derivatives of the
velocity and acceleration fields at one point. In fact (12)-(15) and (22)-(25) can be solved uniquely (up
to a reflection) for $\sigma$, $\omega_1$, and $\omega_2$ in terms of $\omega_3$:

o  — -  tali  (26) 

^i  —  ±(cr  —  /3  ~ 

-  T[Uy  +  -  -T (28) 


where 

o  ---  K  +  «^)K  -  O  -  K  +  +  <)  (29) 

P  —  (u^i  —  v,)2(w2,  +  «x)  -I-  »cK  +  w'„)(wi  —  Tr)  +  f  v'y)  (30) 

i  (“d  I  w'„)(w.i  f-  'hi)2  -'vK  i  >0(^1  f  nn)  I  •+  ’*,)  (  J|) 


This concludes the proof of lemma 2 and of the shape from motion proposition.

¹²Degenerate conditions can be found by determining when the determinant is zero.
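The uniqueness up to a reflection can be illustrated with a small numerical sketch. It is not a transcription of equations (26)-(31). Instead it eliminates the angular accelerations directly, assuming the velocity relations $u_x = \omega_2 z_x$, $u_y = \omega_2 z_y - \omega_3$, $v_x = \omega_3 - \omega_1 z_x$, $v_y = -\omega_1 z_y$ together with the four acceleration relations, and assuming the tilt $\tau$ and $\omega_3$ are already known from the velocity field. Under these assumptions the elimination leaves a single equation that is linear in $\tan^2 \sigma$. The function names and test values are hypothetical.

```python
import math

def forward(zx, zy, w, dw):
    """Image measurements predicted by the assumed model equations."""
    w1, w2, w3 = w
    d1, d2, d3 = dw
    vel = (w2 * zx, w2 * zy - w3, w3 - w1 * zx, -w1 * zy)   # u_x, u_y, v_x, v_y
    acc = (d2 * zx - w2 ** 2 - w3 ** 2 + w3 * w1 * zx,      # u'_x
           d2 * zy - d3 + w1 * w3 * zy + w1 * w2,           # u'_y
           d3 - d1 * zx + w2 * w3 * zx + w1 * w2,           # v'_x
           -d1 * zy - w1 ** 2 - w3 ** 2 + w2 * w3 * zy)     # v'_y
    return vel, acc

def recover(vel, acc, tilt, w3):
    """Return (slant, w1, w2) given the tilt angle and w3.

    Assumes noise-free data at a non-degenerate point (nonzero slant and a
    positive denominator)."""
    ux, uy, vx, vy = vel
    ax, ay, bx, by = acc
    c, s = math.cos(tilt), math.sin(tilt)
    num = (uy - vx + 2 * w3) ** 2
    den = (c * s * (ay + bx - w3 * (ux - vy))
           - s * s * (ax + w3 * vx)
           - c * c * (by - w3 * uy))
    m = math.sqrt(num / den)          # m = tan(slant)
    zx, zy = m * c, m * s             # surface gradient from slant and tilt
    if abs(zx) >= abs(zy):            # divide by the larger component
        w1, w2 = (w3 - vx) / zx, ux / zx
    else:
        w1, w2 = -vy / zy, (uy + w3) / zy
    return math.atan(m), w1, w2

# Synthetic ground truth (arbitrary illustrative values).
true_zx, true_zy = 1.0, 2.0
true_w, true_dw = (1.0, 2.0, 3.0), (1.0, 1.0, 1.0)
vel, acc = forward(true_zx, true_zy, true_w, true_dw)
tilt = math.atan2(true_zy, true_zx)

slant, w1, w2 = recover(vel, acc, tilt, true_w[2])
# Reflected solution: tilt + pi gives the same slant with w1 and w2 negated.
slant_r, w1_r, w2_r = recover(vel, acc, tilt + math.pi, true_w[2])
```

Both solutions reproduce the same eight measurements, which is the reflection ambiguity referred to in the proposition.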

5.4 Remarks on the Shape From Motion Proposition

It has been shown that the angular velocity and the surface normal at each visible point of a rigidly
rotating regular surface are uniquely determined up to a reflection if one is given the orthographic
projection and first spatial derivatives of the associated velocity and acceleration fields. This proposition
and its proof are proposed as the basis for a computational theory of the human visual ability to
perceive the shape of a smooth moving surface from its motion alone.

Some disclaimers are in order. First, only arguments for the sufficiency of this approach, not its
necessity, have been suggested. Alternative computational theories are available, some of which were
discussed earlier. It is a matter for empirical investigation to determine which, if any, of the current
theories is to some extent psychologically real.

Two pieces of psychophysical evidence may be adduced to suggest the greater psychological reality
of the present approach over previous ones. First are Ullman's (1979) experiments, mentioned before,
which indicate that only orthographic, not perspective, information seems to be utilized by the visual
system in recovering surface shapes. The second is that the resulting perceptual effect (illustrated
in figure 1) is of a smooth surface as opposed to isolated points connected by invisible wires. This
suggests greater psychological reality for an approach which builds a description whose primitives
relate to surfaces.

A second disclaimer must be mentioned. The visual system may utilize additional generally valid
constraints for the interpretation of surface shapes from motion. For example, shortcuts in computing
the slant, $\sigma$, might be based on noting that $\sigma$ must be 90 degrees at external occluding contours
and must vary smoothly between them. Another potentially powerful constraint is that the tilt field
must be locally orthogonal to the image of its occluding contour (for smooth surfaces). Thus further
investigation of valid means to reduce the computational complexity of this approach is warranted
before serious claims for its psychological reality can be sustained.


6.  Computing  Velocity  Fields 

To this point the analysis has assumed the velocity field and its first spatial derivatives are given.
Clearly this is not the case for a real observer. The real observer is presented with a temporally
changing optic array. If a velocity field is required it must be constructed from the changing optic
array.

The problem of computing a velocity field has remained nontrivial despite much recent research.
One can show that the motion information available at any single point in a changing optic array is
insufficient to uniquely determine the velocity field at that point. Consequently much of the research
in the field of optical flow has been devoted to discovering valid means of integrating motion information
from local neighborhoods to uniquely determine the flow at each point in the neighborhood.
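The insufficiency of single-point information can be made concrete with a small sketch. It assumes the standard brightness constancy constraint $E_x u + E_y v + E_t = 0$, which gives one equation in the two unknowns $(u, v)$ at each point, and it combines the constraints from a neighborhood by local least squares under the assumption that all points in the neighborhood share one velocity. This is only one possible integration scheme, chosen for brevity; it is not the global smoothness method of Horn and Schunck. The function name and sample gradients are hypothetical.

```python
# Local least-squares integration of brightness-constancy constraints (sketch).

def flow_from_neighborhood(gradients):
    """Estimate one (u, v) from samples (Ex, Ey, Et) that are assumed to
    share the same image velocity. Returns None when the samples do not
    constrain both components (the aperture problem)."""
    sxx = sum(ex * ex for ex, ey, et in gradients)
    sxy = sum(ex * ey for ex, ey, et in gradients)
    syy = sum(ey * ey for ex, ey, et in gradients)
    sxt = sum(ex * et for ex, ey, et in gradients)
    syt = sum(ey * et for ex, ey, et in gradients)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # all spatial gradients are parallel (or there is one sample)
    # Solve the 2x2 normal equations [sxx sxy; sxy syy][u v]^T = -[sxt syt]^T.
    u = (-sxt * syy + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Synthetic samples generated from a known velocity (illustrative values).
true_u, true_v = 0.5, -0.25
spatial = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
samples = [(ex, ey, -(ex * true_u + ey * true_v)) for ex, ey in spatial]

single_point = flow_from_neighborhood(samples[:1])   # one constraint only
neighborhood = flow_from_neighborhood(samples)       # several orientations
```

With a single sample, or with samples whose spatial gradients are all parallel, the normal equations are singular and no unique velocity exists; gradients of differing orientation within the neighborhood remove the ambiguity.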

A detailed analysis of the problem of determining optical flow is presented in Horn and Schunck
(1980), which also includes a representative list of references on the topic.

¹³Actually only the first spatial derivatives have been used.

7. Summary

A computational analysis of the human visual ability to infer surface shapes entirely from their motion
has been presented. The analysis proceeded in three main steps. First it was shown that surface
tilt, $\tau$, and the component of angular velocity orthogonal to the image plane, $\omega_3$, may be derived from
just the spatial derivatives of the velocity field (assuming orthographic projection). Then it was shown
that surface slant, $\sigma$, and the two components of angular velocity lying parallel to the image plane, $\omega_1$
and $\omega_2$, are computable if the first spatial derivatives of the acceleration field are also available. Finally
the problem of computing velocity fields from changing optic arrays was discussed briefly.